Number Theoretic Transforms

7 March 2024 (updated 7 March 2024)

$$ \gdef\delim#1#2#3{\mathopen{}\mathclose{\left#1 #2 \right#3}} \gdef\p#1{\delim({#1})} \gdef\F{\mathbb F} \gdef\i{\mathrm i} \gdef\mod#1{\delim[{#1}]} \gdef\M{\mathsf{\small M}} \gdef\A{\mathsf{\small A}} \gdef\Mc{\mathsf{\small M^c}} \gdef\Ac{\mathsf{\small A^c}} \gdef\NTT{\mathsf{\small NTT}} \gdef\CCV{\mathsf{\small CCV}} $$

NTTs, FFTs, convolutions, multiplication, interpolation, Chinese remainder theorem, Lagrange basis, Representation theory, FRI. It's all the same thing.

https://en.wikipedia.org/wiki/Fourier_transform_on_finite_groups#Relationship_with_representation_theory

To do. Prime length FRI using Rader?

A lot of research on them has been done in DSP. But there the more sophisticated methods are not as popular because

Exchanging muls for adds is not very worthwhile.
Numerical accuracy is challenging.

Neither of these holds in finite fields.

Introduction

$$ \gdef\ω{\text{ω}} \gdef\MF{\mathrm{F}} $$

The number theoretic transform (NTT) of a sequence $x_i$ of length $n$ over a field $\F$ is defined as a sequence $f_k$ of length $n$ by

$$ \begin{align} f_k = \sum_{i∈[0,n)} x_i ⋅ \ω_n^{i⋅k} \end{align} $$

where $\ω_n$ is a primitive $n$-th root of unity. When $\F$ is the complex numbers the roots are $\ω_n = \mathrm{e}^{\frac{2⋅\mathrm{π}⋅\mathrm{i}}{n}}$ and the transform is called the discrete Fourier transform (DFT) and its implementations fast Fourier transform (FFT). In a prime order finite field $\F_p$ with generator $g$ the roots are $\ω_n = g^{\frac{p-1}{n}}$. Note that the roots are not unique and $\ω_n^k$ is also a valid $n$-th root of unity whenever $k$ and $n$ a coprime.

The roots are just numbers and satisfy all the algebraic properties such that $\ω_n^0 = 1$ and $\ω_n^{a + b} = \ω_n^a⋅\ω_n^b$. In addition they have some unique properties such as $\ω_n^n = 1$, $\ω_{k⋅n}^k = \ω_n$ and $\sum_{i∈[0,n)} \ω_n^i = 0$.

$$ \begin{align} \sum_{i∈[0,n)} \ω_n^{a⋅i} = \begin{cases} n & a = 0 \\ 0 & a ≠ 0 \end{cases} \end{align} $$

From these it follows that $\ω_2 = -1$, which is why the $n=2$ case is particularly efficient:

$$ \begin{aligned} f_0 &= x_0 + x_1 & f_1 &= x_0 - x_1 \end{aligned} $$

NTT is Almost it's own Inverse

The inverse of the NTT is

$$ x_i = \frac 1n \sum_{k∈[0,n)} f_k ⋅ \ω_n^{-i⋅k} $$

Proof

$$ \begin{aligned} \frac 1n \sum_{k∈[0,n)} f_k ⋅ \ω_n^{-i⋅k} &= \frac 1n \sum_{k∈[0,n)} \p{\sum_{j∈[0,n)} x_j ⋅ \ω_n^{j⋅k}} ⋅ \ω_n^{-i⋅k} \\ &= \frac 1n \sum_{k∈[0,n)} \sum_{j∈[0,n)} x_j ⋅ \ω_n^{j⋅k-i⋅k} \\ &= \frac 1n \sum_{j∈[0,n)} x_j \sum_{k∈[0,n)} \ω_n^{\p{j-i}⋅k} \\ &= \frac 1n \sum_{j∈[0,n)} x_j \p{\begin{cases} n & i = j \\ 0 & i ≠ j \end{cases}} \\ &= \frac 1n x_i ⋅ n \\ &= x_i \\ \end{aligned} $$

The factor of $\frac 1n$ is constant can be applied to the output (as in the sum above) or brought inside the sum and applied to the input. It can even be applied together with the $\ω$ factors to save operations, but this requires a bit more care to correctly propagate through the various algorithms presented below.

The other difference with a regular $\NTT$ is the factor of $-1$ in the exponent of $\ω$. Since for each $\ω_n' = \ω_n^{-1}$ is also a primitive $n$-th root of unity we can use the regular algorithm with a different root. Instead of changing the root, we can also substitute $i' = \mod{-i}_n$. This has the effect of reversing the order of all but the $0$-th index element in the output sequence. By symmetry with $k$ this can also be done on the input elements.

NTT is Matrix Multiplication

This equation can also be seen as a multiplying the vector $\vec x$ by the DFT matrix $\MF_n$ where $(\MF_n)_{ij} = \ω_n^{i⋅j}$, i.e.

$$ \begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ ⋮ \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} ω^0 & ω^0 & ω^0 & ⋯ & w^0 \\ ω^0 & ω^1 & ω^2 & ⋯ & w^{n -1}\\ ω^0 & ω^2 & ω^4 & ⋯ & w^{2⋅(n-1)}\\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ ω^0 & ω^{n-1} & ω^{2⋅(n-1)} & ⋯ & ω^{(n-1)^2} \\ \end{bmatrix}⋅ \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ ⋮ \\ x_{n-1} \end{bmatrix} $$

The matrix is symmetric, $\MF_n^\mathsf{T}=\MF_n$, and has inverse $(\MF_n^{-1})_{ij} = \frac 1n ⋅ \ω_n^{-i⋅j}$. It is also a Vandermonde matrix on the basis $(\ω_n^0,…, \ω_n^{n-1})$.

NTT is Polynomial Evaluation

We can also interpret the NTT equation as evaluating a polynomial $P∈\F[X]$ with coefficients $x_i$ in points $\ω_n^k$ as

$$ \begin{aligned} P(X) &= \sum_{i∈[0,n)} x_{i} ⋅ X^i &\text{and}&& f_k = P\p{\ω_n^k} \end{aligned} $$

NTT is Polynomial Interpolation

If we interpret $x_i$ as a set of values of some polynomial $P∈\F[X]$ on the basis $(\ω_n^0,…, \ω_n^{n-1})$. Then we can construct the polynomial using Lagrange interpolation.

$$ P(X) = \sum_{i∈[0,n)} x_i ⋅ L_{n,i}(X) $$

where the Lagrange polynomials $L_{n,i}(X)$ are

$$ L_{n,i}(X) = \prod_{j∈[0,n)\setminus i} \frac{X - \ω_n^j}{\ω_n^i - \ω_n^j} $$

NTT is Chinese Remainder Theorem

Another way of evaluating a polynomial at $x$ is by reducing it modulo the monomial $(X - x)$, which leads to

$$ \begin{aligned} f_k = \mod{P}_{\p{X - \ω_n^k}} \end{aligned} $$

The product of all these moduli is a Cyclotomic polynomial

$$ \prod_{k∈[0,n)} \p{X - \ω_n^k} = X^n - 1 $$

If we know two factors $n = a⋅b$ then

$$ X^n - 1 = \p{X^a - 1}⋅\p{X^b - 1} $$

$$ X^{a + b} - X^a - X^b + 1 $$

NTT is a ring isomorphism

The vector space $\F^n$ forms a ring under vector addition and Hadamard product. The $\NTT$ is a ring isomorphism between $\F^n$ and the ring $\F[X]/\p{X^n - 1}$. The fastest known algorithms for multiplying in $\F[X]$ work by finding a sufficiently large $n$ to lower the problem to $\F[X]/\p{X^n - 1}$ and then use $\NTT$ to transform to $\F^n$, apply the Hadamard product, and transform back.

If we know two factors $n = 2 ⋅ m$ then

$$ X^n - 1 = \p{X^m - 1} ⋅ \p{X^m + 1} $$

$$ \p{X^m - 1} ⋅ \p{X^m + 1} = X^{2⋅m} - 1 $$

Using the Chinese remainder theorem we can reconstruct $P$ from these remainders

$$ \begin{aligned} M &= \prod_{i∈[0,n)} m_i & M_i &= \frac{M}{m_i} & \mod{x}_M = \mod{ \sum_{i∈[0,n)} \frac{M}{m_i} ⋅ \mod{x⋅\frac{m_i}{M} }_{m_i} }_M \end{aligned} $$

$$ \begin{aligned} \mod{P}_{\p{X^n -1}} = \mod{ \sum_{i∈[0,n)} \frac{\p{X^n -1}}{\p{X - \ω_n^i}} ⋅ \mod{P⋅\frac{\p{X - \ω_n^i}}{\p{X^n -1}} }_{\p{X - \ω_n^i}} }_{\p{X^n -1}} \end{aligned} $$

$$ \begin{aligned} P = \sum_{k∈[0,n)} f_k⋅\prod_{k∈[0,n)} \p{X - \ω_n^k} \,\operatorname{mod}\, \prod_{k∈[0,n)} \p{X - \ω_n^k} \end{aligned} $$

The Cooley-Tukey algorithm

If $n = a ⋅ b$ we can rewrite $i = i_a + i_b ⋅ a$ with $i_a∈[0,a)$ and $i_b∈[0,b)$. This creates a correspondence $[0,n) ≃ [0,a)×[0,b)$. An alternative correspondence is $k = k_b + k_a ⋅ b$. If we substitute these in the definition of NTT we get:

$$ f_{k_b + k_a ⋅ b} = \sum_{i_a∈[0,a)}\sum_{i_b∈[0,b)} x_{i_a + i_b ⋅ a} ⋅ \ω_n^{\p{i_a + i_b ⋅ a}⋅\p{k_b + k_a ⋅ b}} $$

Expanding the exponent of $\ω_n$ we get

$$ \ω_n^{\p{i_a + i_b ⋅ a}⋅\p{k_b + k_a ⋅ b}} = \ω_n^{i_a ⋅ k_b} ⋅ \ω_n^{i_b ⋅ k_b ⋅ a} ⋅ \ω_n^{i_a ⋅ k_a ⋅ b} ⋅ \ω_n^{i_b ⋅ k_a ⋅ a ⋅ b} $$

Using $\ω_{l⋅m}^l = \ω_m$ we get $\ω_n^a = \ω_b$ and $\ω_n^b = \ω_a$, combined with $\ω_n^{a⋅b} = \ω_n^n = 1$ this simplifies the exponent to

$$ \ω_n^{\p{i_a + i_b ⋅ a}⋅\p{k_b + k_a ⋅ b}} = \ω_b^{i_b ⋅ k_b} ⋅ \ω_n^{i_a ⋅ k_b} ⋅ \ω_a^{i_a ⋅ k_a} $$

Substituting this back in

$$ f_{k_b + k_a ⋅ b} = \sum_{i_a∈[0,a)}\sum_{i_b∈[0,b)} x_{i_a + i_b ⋅ a} ⋅ \ω_b^{i_b ⋅ k_b} ⋅ \ω_n^{i_a ⋅ k_b} ⋅ \ω_a^{i_a ⋅ k_a} $$

This allows us to introduce some brackets

$$ f_{k_b + k_a ⋅ b} = \sum_{i_a∈[0,a)} \delim[{ \p{\sum_{i_b∈[0,b)} x_{i_a + i_b ⋅ a} ⋅ \ω_b^{i_b ⋅ k_b}} ⋅ \ω_n^{i_a ⋅ k_b} }]⋅\ω_a^{i_a ⋅ k_a} $$

We can now see that the sum in parenthesis is an NTT of size $b$ operating over subsequences of $x$. This then gets multiplied by an extra factor $\ω_n^{i_a ⋅ k_b}$, known as a twiddle factor. The outer sum over the square brackets is again an NTT, this time of size $a$. This is the core of the Cooley-Tukey algorithm.

It's instructive to visualize $x$ as a $b×a$-sized matrix in row-major order:

$$ x = \begin{bmatrix} x_0 & x_1 & x_2 & ⋯ & x_{a-1} \\ x_a & x_{1+a} & x_{2+a} & ⋯ & x_{(a-1) + a} \\ x_{2⋅a} & x_{1 + 2⋅a} & x_{2 + 2⋅a} & ⋯ & x_{(a-1) + 2⋅ a} \\ ⋮ &⋮ &⋮ & ⋱ & ⋮ \\ x_{(b-1)⋅a} & x_{1 + (b-1)⋅a} & x_{3 + (b-1)⋅a} & ⋯ & x_{(a-1) + (b-1)⋅a} \end{bmatrix} $$

We then transform all the columns in this matrix using a $b$-sized NTT. To apply the twiddle factors, these can also be written as a $b × a$ matrix with $W_{ij} = \ω_n^{i⋅j}$ which can then be applied by a pointwise product

$$ W = \begin{bmatrix} 1 & 1 & 1 & ⋯ & 1 \\ 1 & \ω & \ω^2 & ⋯ & \ω^{a-1} \\ 1 & \ω^2 & \ω^4 & ⋯ & \ω^{\p{a-1}⋅2} \\ ⋮ &⋮ &⋮ & ⋱ & ⋮ \\ 1 & \ω^{b-1} & \ω^{2⋅\p{b-1}} & ⋯ & \ω^{\p{a-1}⋅\p{b -1}} \\ \end{bmatrix} $$

The next step is to transform all the rows using a $a$-sized NTT. All the values are now computed, but they are not yet in the correct order. The output order, $f_{k_b + k_a ⋅ b}$, corresponds to a $a×b$ sized matrix in row-major order. Fortunately it is just the transpose of the matrix we have computed.

In summary the steps are

Reinterpreted as a $b×a$ matrix in row-major order.
Apply $b$-sized NTTs to all the columns in $x$
Multiply each element by its twiddle factor $\ω_n^{i⋅j}$.
Apply $a$-sized NTTs to all the rows.
Transpose the matrix to have the output in row-major order.

The first and fifth step are commonly omitted and the algorithm is known as the four-step NTT or Bailey's FFT. It has the implementation drawback that the NTTs in the second step are over values discontinuous in memory. We can solve this by doing two more transpositions after steps 1 and 2 to get the six-step NTT

If the factors $a$ and $b$ of $n$ are again composite the Cooley-Tukey algorithm can be applied recursively. A common strategy is to pick a highly composite $n$ such as $n = 2^k$ and recursively split it. There are several strategies to do this. If we make $a$ as small as possible, it is called decimation-in-time (DIT). If instead we make $b$ as small as possible it is decimation-in-frequency (DIF). In both cases the small base case is known as the radix. It is also common to refer to NTT algorithms by their radix as in 'radix-4 NTT'. Highly optimized small NTTs to be used as base case are known as codelets. These are the subject of ongoing research but this is mostly focussed on the complex numbers. If we make $a$ and $b$ about the same size it is called a radix-$\sqrt{n}$. The six-step radix-$\sqrt{n}$ algorithm is a cache-oblivious algorithm when combined with an cache-oblivious matrix transpose. This makes it particularly suitable for very large $n$. Finally for complex numbers an variant called split-radix recurses into a size $\frac n2$ and two $\frac n4$ NTTs.

The Good-Thomas algorithm

Starting again with a factorization $n = a⋅b$, let's look at all the bijections between $[0,n)$ and $[0,a)×[0,b)$ of the form

$$ \begin{aligned} i &= \mod{i_a ⋅ e_a + i_b ⋅ e_b}_n \\ k &= \mod{k_a ⋅ e_a' + k_b ⋅ e_b'}_n \\ \end{aligned} $$

where $e$'s are parameters and $\mod{-}_n$ denotes reduction modulo $n$. We can substitute this into the $\ω_n$ exponent again. Note that modular reduction is already implied by the root of unity.

$$ \ω_n^{\p{i_a ⋅ e_a + i_b ⋅ e_b}⋅\p{k_a ⋅ e_a' + k_b ⋅ e_b'}} = \ω_n^{i_a ⋅ k_a ⋅ e_a ⋅ e_a'} ⋅ \ω_n^{i_b ⋅ k_b ⋅ e_b ⋅ e_b'} \ω_n^{i_a ⋅ k_b ⋅ e_a ⋅ e_b'} ⋅ \ω_n^{i_b ⋅ k_a ⋅ e_b ⋅ e_a'} ⋅ $$

In Cooley-Tukey we used the solutions $e_a = 1$, $e_b = a$, $e_a' = b$, $e_b' = 1$. This turned the first two roots into $\ω_a$ and $\ω_b$ and got rid of the $i_b⋅k_a$ term. We where left with a $i_a⋅k_b$ term that turned into twiddle factors. Can we get rid of this term? The necessary conditions for this are

$$ \begin{aligned} \mod{e_b ⋅ e_a'}_n &= 0 &\hspace{2em} \mod{e_a ⋅ e_b'}_n &= 0 \\ \mod{e_a ⋅ e_a'}_n &= b &\hspace{2em} \mod{e_b ⋅ e_b'}_n &= a \end{aligned} $$

We also need some conditions to make sure the map is a bijection.

In Bur77 it is shown that the map is a bijection iff

$\gcd(a, b) = 1$ and
- $\gcd(e_a, a) = \gcd(e_b, b) = 1$ and at least one of
  - $e_a = α ⋅ a$
  - $e_b = β ⋅ b$
$\gcd(a, b) > 1$ and either
- $e_a = α ⋅ a$ and $e_a ≠ β ⋅ b$ and $\gcd(α, a) = \gcd(e_b, b) = 1$
- $e_a ≠ α ⋅ a$ and $e_a = β ⋅ b$ and $\gcd(e_a, a) = \gcd(β, b) = 1$

The simplest solution to this set of constraints is

$$ \begin{aligned} e_a &= b &\hspace{2em} a_b &= a \\ e_a' &= \mod{b^{-1}}_a⋅b &\hspace{2em} e_b' &= \mod{a^{-1}}_b⋅a \end{aligned} $$

The inverses only exist if $\gcd(a,b)=1$, i.e. we have a factorization of $n$ into co-prime factors. It is shown in Bur77 and others that all solutions require a co-prime factorization. Proceeding as before we get

$$ f_{\mod{k_a ⋅ e_a' + k_b ⋅ e_b'}_n} = \sum_{i_a∈[0,a)} \p{\sum_{i_b∈[0,b)} x_{\mod{i_a ⋅ e_a + i_b ⋅ e_b}_n} ⋅ \ω_b^{i_b ⋅ k_b}} ⋅ \ω_a^{i_a ⋅ k_a} $$

We have a clean separation into two NTTs of size $a$ and $b$ without twiddle factors. The order of the two NTTs does not matter. This is the Good-Thomas or Prime Factor NTT algorithm. It can also be visualized as a matrix, but this time the input and output mapping is no longer a simple reinterpretation in row-major order but a more complex permutation.

Our optimal strategy is now to first use Good-Thomas recursively to break down $n$ into co-prime factors. These factors of the form $p^n$ for some prime $p$ can be further reduced using Cooley-Tukey which leaves us with prime-sized NTTs as base cases.

Bluestein algorithm

Start again with the NTT formula

$$ \begin{aligned} f_k = \sum_{i∈[0,n)} x_i ⋅ \ω_n^{i⋅k} \end{aligned} $$

and substitute the identity

$$ i⋅k = \frac{1}{2}⋅\p{i^2 + k^2 - (i - k)^2} $$

in the exponent to get

$$ \begin{aligned} \ω_n^{i⋅k} &= \ω_n^{\frac{1}{2}⋅\p{i^2 + k^2 - (i - k)^2}} \\ &= \ω_{2⋅n}^{i^2 + k^2 - (i - k)^2} \\ &= \ω_{2⋅n}^{i^2}⋅\ω_{2⋅n}^{k^2}⋅\ω_{2⋅n}^{-(i - k)^2} \\ \end{aligned} $$

We use the fact that $\ω_n^{\frac 12} = \ω_{2⋅n}$ if $\ω_{2⋅n}$ exists. Substituting this back in the sum and moving factors around we get

$$ \begin{aligned} f_k &= \ω_{2⋅n}^{k^2}⋅\sum_{i∈[0,n)} \p{x_i ⋅ \ω_{2⋅n}^{i^2}} ⋅ \ω_{2⋅n}^{-(i - k)^2} \end{aligned} $$

This formula has the shape

$$ \begin{aligned} f_k &= a_k ⋅ \sum_{i∈[0,n)} b_i ⋅ c_{i-k} \end{aligned} $$

Which is a convolution between $b$ and $c$ followed by pointwise multiplication by $a$.

$$ \begin{aligned} a_i &= \ω_{2⋅n}^{i^2} &\hspace{2em} b_i &= x_i ⋅ \ω_{2⋅n}^{i^2} &\hspace{2em} c_i &= \ω_{2⋅n}^{-i^2} & \end{aligned} $$

We have

$$ c_{i+n} = \ω_{2⋅n}^{-\p{i+n}^2} = \ω_{2⋅n}^{-i^2-n^2 - 2⋅i⋅n} = \ω_{2⋅n}^{-i^2}⋅\ω_{2⋅n}^{-n^2}⋅\ω_{2⋅n}^{-2⋅i⋅n} = c_i ⋅ \ω_{2}^{-n} = \p{-1}^n ⋅ a_i $$

If $n$ is even, the sequence $c$ is cyclic, if $n$ is odd it is negacyclic.

TODO. The sign can be absorbed in $\ω_{2⋅n}$. See HH19 p 13.

This is Bluestein's algorithm

Pointwise multiply by $\ω_{2⋅n}^{i^2}$.
Convolve with $\ω_{2⋅n}^{-i^2}$.
Pointwise multiply by $\ω_{2⋅n}^{i^2}$.

Q. Can we use a field extension to add $\ω_{2⋅n}$ if it does not exist? i.e. work in $\F[X]/(X^2-\ω_n)$. Then substitute $X$ for $\ω_{2⋅n}$.

$$ \begin{aligned} a_i &= X^{i^2} \end{aligned} $$

$X^0, X^1, X^4, X^9, X^{16}$.

Rader's NTT

If $n$ is prime then the non-zero numbers $[1,n)$ form a group under multiplication modulo $n$. Pick a generator $g$ for this group, then $g^{-1}$ is also a generator. We can now rewrite the non-zero indices as

$$ \begin{aligned} i &= \mod{g^{i'}}_n &\hspace{2em} k = \mod{g^{-k'}}_n \end{aligned} $$

To use this we first split the zero indices out of the NTT equation

$$ \begin{aligned} f_0 &= \sum_{i∈[0,n)} x_i &\hspace{2em} f_i &= x_0 + \sum_{i∈[1,n)} x_i ⋅ \ω_n^{i⋅k} \end{aligned} $$

and substitute our index map to get

$$ \begin{aligned} f_{\mod{g^{-k'}}_n} = x_0 + \sum_{i'∈[1,n)} x_{\mod{g^{i'}}_n} ⋅ \ω_n^{g^{i'-k'}} \end{aligned} $$

The sum has the form $\sum_{i∈[0,n-1)} a_{i} ⋅ b_{i-k}$ which is a convolution of size $n-1$. The $b$ term is cyclic, so this is a cyclic convolution. This method for computing NTTs is called Rader's algorithm. Much like Good-Thomas, it uses primality and clever permutation to avoid twiddle factors.

Rader-Winograd NTT

If $n = p^m$ with odd prime $p$ then the elements not a multiple of $p$, i.e. $\mod{i}_p>0$, form a group of size $p^m -p^{m-1}$ elements for which we can find a generator $g$.

TODO

https://encyclopediaofmath.org/wiki/Winograd_Fourier_transform_algorithm

Binary Rader-Winograd NTT

If $n = 2^m$ then the elements not a multiple of $p$, i.e. $\mod{i}_p>0$, form a group of size $p^m -p^{m-1}$ elements for which we can find a generator $g$.

Cyclotomic Fast Fourier transforms

https://en.wikipedia.org/wiki/Cyclotomic_fast_Fourier_transform

Bruun's NTT

https://en.wikipedia.org/wiki/Bruun%27s_FFT_algorithm

Johnson–Burrus fast Fourier transform

See [Bla10] 12.5. Generalization over Good-Thomas and Winograd nesting.

Summary of $\NTT_n$ algorithms

All presented methods require the $\ω_n$ to exist and turn the $\NTT_n$ into a simpler sub-problem. Here $\NTT$ is a number theoretic transform, $\CCV$ is a cyclic convolution, $\CCV^{-}$ is a negacyclic convolution, $\A$, $\M$ are additions and multiplications in $\F$, and $\Ac$, $\Mc$ the same but by a constant.

Algorithm	Requirements	Sub problems
Direct		$n^2⋅\Mc + n⋅\p{n-1}⋅\A$
Cooley-Tukey	$n= a⋅b$	$b⋅\NTT_a + a⋅\NTT_b + n⋅\Mc$
Good-Thomas	$n=a⋅b$, $\gcd(a,b)=1$	$b⋅\NTT_a + a⋅\NTT_b$
Bluestein	$\ω_{2⋅n}$ exists	$\CCV_n^{\p{-1}^n} + 2⋅n⋅\Mc$
Rader	$n$ prime	$\CCV_{n-1} + \p{2⋅n-1}⋅\A$
Rader-Winograd	$n=p^m$, odd prime $p$	$\CCV_{p^m - p^{m−1}} + \p{p^{m-1}+1}⋅\CCV_{p-1}$
Binary Rader-Winograd	$n=2^m$	$2⋅\CCV_{2^{m−2}}$

Note: The $\Mc$ values are deceptive. Often these can be efficiently folded into sub problems, or they are special values in $\F$ that are easier to compute. The $\A$'s in Radercan also be folded into some of the

Synthetic roots of unity

For a given finite field $\F_q$ a primitive $n$-th root of unity exists iff $n$ divides $q-1$. This also holds true for extension fields.

Suppose we want to use Bluesteins algorithm and we have $\ω_n$ but not the required $\ω_{2⋅n}$. We can work around this by taking the extension field $\F_{q^2} = \F_q[X]/(X^2 - \ω_n)$ and using $\ω_{2⋅n} = X$.

In prime fields $\F_p$ with $\mod{p}_4 = 3$ the polynomial $X^2+1$ is irreducible and we can consider the field extension $\F_{p^2} = \F_p[X]/(X^2 + 1)$. We have $q^2 - 1 = \p{q+1}⋅\p{q-1}$. So the new field must have a primitive root of order $q+1$. [Hab23]

What if we want to do an NTT of size $n$, but don't have $n$-th roots of unity in the field? If out field is $\R$ we have at most a 2-nd root of unity, $-1$. If we want more we need an extension $\R/(X^2 + 1)$. More commonly known as the complex numbers $\mathbb{C}$ with $\i^2 = -1$.

$$ \ω_n = \mathrm{e}^{\frac{2⋅\mathrm{π}⋅\mathrm{i}}{n}} = 1 + \frac{2⋅\mathrm{π}}{n}⋅\i -\frac{1}{2}⋅\p{\frac{2⋅\mathrm{π}}{n}}^2 -\frac{1}{6}⋅\p{\frac{2⋅\mathrm{π}}{n}}^3⋅\i +\cdots $$

By definition $\ω_n^n = 1$, so to synthetically construct one we consider $\F[X]/\p{X^n - 1}$. But $X^n-1$ is not necessarily ireducable.

NTT by Complex Field Extension

If the $X^2 + 1$ is an irreducable polynomial in $𝔽_p$ then we can define the complex extension field of $𝔽_p$ as

$$ ℂ(𝔽_p) = 𝔽_p[X]/(X^2 + 1) $$

Note that the polynomial is irreducable exactly when $\mod{p}_4 = 3$.

The polynomials $a + b⋅X$ in this extension field have the property that $X$ acts as $\sqrt{-1}$. So we will adopt the complex number notation $\i$ for $X$ and write $a + b ⋅ i$ instead.

For a finite field $\F_q$ Roots of unity exist for every divisor of $q-1$. In a quadratic field extension $q-1=p^2-1 = \p{p-1}⋅\p{p+1}$. So the we have all the original divisors, plus what

NTT by Elliptic Curves

https://solvable.group/posts/ecfft/

Finding curves SS14.

Find an elliptic curve over $\F$ with

Take a subgroup of the elliptic curve of desired size $n$. Take as $x$ coordinates the $x$ coordinates of those points. We will interpolate a polynomial on those coordinates.

Next we'll need an isogeny to half the elliptic curve size. An isogeny is a surjective morphism between two elliptic curves. If the curves are of the form $y^2 = f(x)$ they can always be written in the standard affine form:

$$ ψ(x,y) = \p{\frac{u(x)}{v(x)}, \frac{s(x)}{t(x)}⋅y} $$

where $u,v$ and $s,t$ are coprime.

The groups $\Z_n^\times$.

The multiplicative group of numbers modulo $n$, $\Z_n^\times$, is cyclic iff $n$ is $1,2,4,p^k,2⋅p^k$ for odd prime $p$. In this case $\Z_n^\times ≅ \mathrm{C}_{\mathrm{φ}(n)}$ where $\mathrm{φ}$ is Euleur's totient function. For powers of two $\Z_{2^k}^\times ≅ \mathrm{C}_{\mathrm{φ}(n)}$

The Chinese Remainder Theorem

Let I and J be ideals of a ring R. Remainder arithmetic modulo I and J means mapping R/IJ to (R/I) × (R/J) by z 7 → (z mod I, z mod J).

The Chinese remainder theorem states that R/IJ is isomorphic to (R/I)×(R/J) if I and J are coprime, i.e., if there exist u ∈ I and v ∈ J with u + v = 1. The inverse map is (x, y) 7 → vx + uy.

Applied to a modular basis of polynomials $X - x_i$ this is called evaluation and interpolation.

Applied to a modular basis of polynomials $X - \ω_n^i$ this is the $\NTT$.

Karatsuba's trick is $\F[X]/(X^2 - X) ≅ \F[X]/(X - 1)×\F[X]/X$

References

R. Tolimieri, M. An, C. Lu, "Algorithms for discrete Fourier transform and convolution", Springer (1989) pp. 1997.
D.P. Kolba, T.W. Parks, "Prime factor FFT algorithm using high speed convolution" IEEE Trans. Acoustics, Speech and Signal Processing, ASSP–25 (1977) pp. 281–294
Blahut (2010). "Fast algorithms for signal processing".

C. S. Burrus (2012). “Fast Fourier Transforms”. https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Signal_Processing_and_Modeling/Fast_Fourier_Transforms_(Burrus)
[46,47] C. S. Burrus (1977). “Index mappings for multidimensional formulation of the DFT and convolution.” https://doi.org/10.1109/TASSP.1977.1162938

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=12550db10683005f3f612f9b7c736f52a30279fe

https://cr.yp.to/lineartime/multapps-20080515.pdf Mostly about fast multiplication and related algorithms.

https://cr.yp.to/arith/tangentfft-20070919.pdf The Tangent FFT. Goes further than split-Radix. Not sure how it generalizes to fields.

https://cr.yp.to/papers/m3-20010811-retypeset-20220327.pdf

https://en.wikipedia.org/wiki/Multiplicative_group_of_integers_modulo_n

TODO. Nussbaumer–Quandalle permutation algorithm. See [Bla10] p. 411.